The Moon Shot of our Generation




o Every generation has its signature "big 
science" projects 

o Ours are the collection and analysis of 
the genomes of a whole range of 
organisms 

o Being able to do so can provide a 
platform for solving some of the key 
problems of our age 

o ...and... 
Hackers and the genome



The analysis of the genome 
data is a deeply collaborative 
activity 

Never before has a "big 
science" project been so 
open to widespread 
participation 




Photo by bre pettis: http://www.flickr.com/photos/bre/3230258762/ 



Reverse engineering a genome ... hack the 
protocols 



Network protocols 



Device wiring 



Design Drawings 




Cell-Cell Signaling 



Intracellular Pathways 



Genome DNA Sequences 



Collaboration outside of the life sciences 
has contributed a lot to theoretical biology 
Why? Emerging Diseases.




o Better antibiotics for new diseases require 
new understanding 

Compare the genomes of people with... 
...the microbial genomes 

Differences can be exploited with new or 
existing pharmaceuticals 



wikimedia 



Why? Energy. 



Genomes of microorganisms 

More efficient production of ethanol, 
methane or even butanol 

Genomes of algae 

Increase in oil production 
Improve yield 

Produce ethanol or butanol directly? 
...even engineer for light production? 




photo by tochis: http://tochismochis.blogspot.com/ 



Why? Cancer. 




o Finding Better Anti-Cancer Drugs 

Comparing the mutations in cancer to 
healthy cells 

Investigating the outcomes of 
treatments 

Correlating normally-occurring 
differences in genes with risk 



wikimedia 



Why? Food 




o Improve plant resistance to 
drought, pathogens, heat 

o Improve yield 

o Adapt for different growing 
conditions 
Finally - Sequenced DNA
http://www.cultivate-int.org/issue8/oriel/index.html



Why? 



o We are obtaining biological data at a steadily-increasing rate - 
an exponential curve steeply tilting to vertical 

o Converting that data to usable information is a process that is 
proceeding, albeit not completely keeping up with its 
acquisition 

o Leveraging all that information to create knowledge is an open 
challenge, lagging far behind our rate of data collection 

-> Unique opportunities abound in creating an environment that 
enables true understanding of this rich sea of data 
Biology in a nutshell



o What follows is the fastest and most incomplete biology lesson 
ever. 

o If you know all of this already, bear with me 

o If you don't, hold on... 



Biological Information: 
patient by a physician
From: http://www.cerezyme.com/patient/about/cz_pt_about-understanding-1.gif



Your DNA and you




Your DNA: 

Contains the entire plan (more 
or less) for you 

Compressed, about the same 
size as a DVD (3.2e9 bases, in 
1 ASCII byte per base) 

Differs from the next person's at about
1 in every thousand bases
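The size comparison above is simple arithmetic. A minimal sketch follows, assuming a single-layer DVD holds 4.7e9 bytes (that capacity is an assumption, not a figure from the slide) and showing what a 2-bit-per-base packing would save:

```python
# Back-of-the-envelope storage for one human genome.
# GENOME_BASES comes from the slide; DVD_BYTES is an assumed
# single-layer DVD capacity used only for comparison.
GENOME_BASES = 3.2e9
DVD_BYTES = 4.7e9


def storage_bytes(n_bases, bits_per_base):
    """Bytes needed to store n_bases at the given number of bits per base."""
    return n_bases * bits_per_base / 8


ascii_bytes = storage_bytes(GENOME_BASES, 8)   # one ASCII character per base
packed_bytes = storage_bytes(GENOME_BASES, 2)  # A, C, G, T fit in 2 bits

# Roughly 1 difference per 1000 bases between two people
differing_bases = GENOME_BASES / 1000

print(f"ASCII: {ascii_bytes / 1e9:.1f} GB ({ascii_bytes / DVD_BYTES:.0%} of a DVD)")
print(f"2-bit: {packed_bytes / 1e9:.1f} GB")
print(f"Differences from the next person: about {differing_bases:.1e} bases")
```

At one byte per base the genome takes about two thirds of such a disc; the differences between two people amount to a few million bases.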



DNA contains coded information at many 
layers... 
Transcription



Translation 




DNA DNA 



RNA 



http://www.swbic.org/education/comp-bio/images/lc.gii 
http://en.wikipedia.org/wiki/Protein_structure 
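The DNA -> RNA -> protein flow on this slide can be sketched in a few lines. This is only an illustration: it uses the standard genetic code, ignores real-world details such as splicing and reading-frame detection, and the short input string is made up for the example (it is not a real gene).

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read codons from position 0 until a stop codon ('*') or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up coding sequence for illustration only
dna = "ATGGCGAACGAGGTTTAA"
mrna = transcribe(dna)
print(mrna)             # AUGGCGAACGAGGUUUAA
print(translate(mrna))  # MANEV
```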
Amino Acids



Alpha helix 



Secondary protein structure
occurs when the sequence of amino acids
are linked by hydrogen bonds
... leading to fully formed and operational
structures 
How can we turn this biology into a
computable form? 
http://bioinformatics.ubc.ca/about/what_is_bioinformatics/



Like music, DNA is a structured and layered 
string of data... 
Not too different..
Alcohol Dehydrogenase



ABCDEFGHI JKLMNOPQRSTUVWXYZ 



But the data looks more like this: 



CDS 81..1205
/gene="ADH5"
/gene_synonym="FDH"
/gene_synonym="ADHX"
/gene_synonym="ADH-3"
/gene_synonym="GSNOR"
/EC_number="1.1.1.1"
/codon_start=1
/product="alcohol dehydrogenase 5, chi polypeptide"
/protein_id="NP_000662.3"
/db_xref="GI:71565154"
/db_xref="GeneID:128"
/db_xref="HGNC:253"
/db_xref="HPRD:00064"
/db_xref="MIM:103710"
/translation="MANEVIKCKAAVAWEAGKPLSIEEIEVAPPKAHEVRIKIIATAV
CHTDAYTLSGADPEGCFPVILGHEGAGIVESVGEGVTKLKAGDTVIPLYIPQCGECKF
CLNPKTNLCQKIRVTQGKGLMPDGTSRFTCKGKTILHYMGTSTFSEYTVVADISVAKI
DPLAPLDKVCLLGCGISTGYGAAVNTAKLEPGSVCAVFGLGGVGLAVIMGCKVAGASR
IIGVDINKDKFARAKEFGATECINPQDFSKPIQEVLIEMTDGGVDYSFECIGNVKVMR
AALEACHKGWGVSVVVGVAASGEEIATRPFQLVTGRTWKGTAFGGWKSVESVPKLVSE
YMSKKIKVDEFVTHNLSFDEINKAFELMHSGKSIRTVVKI"
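The CDS coordinates in this record already say how long the protein should be. A small sketch of that arithmetic, assuming the usual 1-based inclusive coordinates and that the last codon of the CDS is a stop codon (the function names are made up for this example):

```python
def cds_length(start, end):
    """Number of bases in a CDS given 1-based, inclusive coordinates."""
    return end - start + 1


def expected_protein_length(start, end):
    """Codons in the CDS minus the final stop codon."""
    n_bases = cds_length(start, end)
    if n_bases % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return n_bases // 3 - 1


print(cds_length(81, 1205))               # 1125 bases
print(expected_protein_length(81, 1205))  # 374 amino acids
```

A quick sanity check on any such record is to compare this number with the length of the /translation string.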



And in XML: 



<?xml version="1.0"?>
<!DOCTYPE TSeq PUBLIC "-//NCBI//NCBI TSeq/EN"
"http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeq>
<TSeq_seqtype value="nucleotide"/>
<TSeq_gi>71565153</TSeq_gi>
<TSeq_accver>NM_000671.3</TSeq_accver>
<TSeq_taxid>9606</TSeq_taxid>
<TSeq_orgname>Homo sapiens</TSeq_orgname>
<TSeq_defline>Homo sapiens alcohol dehydrogenase 5 (class III), chi
polypeptide (ADH5), mRNA</TSeq_defline>
<TSeq_length>2644</TSeq_length>
<TSeq_sequence>GCGCTCGCCACGCCCATGCCTCCGTCGCTGCGCGGCCCACCCCGGATGTCAGCCCCCC
GCGCCGACCAGA

CTGCAGTTGCTTGGGAGGCTGGAAAGCCTCTCTCCATAGAGGAGATAGAG</TSeq_sequence>
</TSeq>
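Once the data is in XML, ordinary tools can read it. Here is a minimal sketch using Python's standard library on a shortened, made-up TSeq record (the 12-base sequence and its length are illustrative only, and the DOCTYPE line is left out to keep the example small):

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative record using the same element names as the slide.
TSEQ_XML = """<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_taxid>9606</TSeq_taxid>
  <TSeq_orgname>Homo sapiens</TSeq_orgname>
  <TSeq_length>12</TSeq_length>
  <TSeq_sequence>GCGCTCGCCACG</TSeq_sequence>
</TSeq>"""


def parse_tseq(xml_text):
    """Pull a few fields out of a TSeq record into a plain dict."""
    root = ET.fromstring(xml_text)
    # Remove any whitespace or line breaks inside the sequence element
    sequence = "".join(root.findtext("TSeq_sequence", default="").split())
    return {
        "seqtype": root.find("TSeq_seqtype").get("value"),
        "organism": root.findtext("TSeq_orgname"),
        "taxid": int(root.findtext("TSeq_taxid")),
        "declared_length": int(root.findtext("TSeq_length")),
        "sequence": sequence,
    }


record = parse_tseq(TSEQ_XML)
seq = record["sequence"]
gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
print(record["organism"], record["taxid"], len(seq))
print(f"GC content: {gc_fraction:.2f}")
```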
What about access to tools and data?




o You need a couple of things to get started: 

Data such as DNA sequences, protein
structures, and biological outcomes are
needed for the analysis

o Tools for managing, analyzing and 
visualizing that data: 

Everything from databases to statistics 
software. 



Photo by mandolux: http://www.flickr.com/photos/mandolux/438145176/ 



http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore



Some sources: Entrez 
The Entrez Nucleotide database is a collection of sequences from several
sources, including GenBank, RefSeq, and PDB. The number of bases in
these databases continues to grow at an exponential rate.
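Entrez can also be queried from a script. The sketch below only builds a request URL for NCBI's E-utilities efetch service; using E-utilities at all is an assumption here (the slides show only the web interface), and the accession NM_000671.3 is the ADH5 mRNA assumed from the example record elsewhere in this deck.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta", retmode="text"):
    """Build an efetch URL for one record in an Entrez database."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    )
    return f"{EFETCH_URL}?{query}"


url = efetch_url("NM_000671.3")
print(url)

# To actually download the record (requires network access):
#   from urllib.request import urlopen
#   fasta_text = urlopen(url).read().decode()
```

Check NCBI's usage guidelines before sending requests in a loop.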
Human Genome



Explore human genome resources or browse the human genome 
sequence using the Map Viewer . 



MyNCBI (Cubby) 
Related resources 

BLAST



Reference sequence 
project 

Search for Genes 

Submit to GenBank 



Building the human genome 

The Human Genome Reference DNA Sequence was completed in April
2003. The current version is listed as a build number on the Genome View 
page and includes an accompanying set of statistics and release notes. 

Homo sapiens (human) genome view. BLAST search the human genome
http://genome.ucsc.edu/cgi-bin/hgTracks?org=human



Some sources: UCSC "Golden Path" 
Vertebrate Multiz Alignment & Conservation (44 Species)
Placental Mammal Basewise Conservation by PhyloP
http://www.rcsb.org/pdb/home/home.do



Some Sources: PDB 
When exploring a structure,
select Structure Analysis and 
then Geometry from the left 
menu to view a 
Ramachandran Plot. 
Images and Visualization



Title: INTERDOMAIN MOTION IN LIVER ALCOHOL
DEHYDROGENASE. STRUCTURAL AND ENERGETIC
ANALYSIS OF THE HINGE BENDING MODE

Authors: Eklund, H., Jones, T.A.
Interdomain motion in liver alcohol dehydrogenase.
Structural and energetic analysis of the hinge bending mode. 
Some tools are extensions of conventional
programming languages:
Some tools are web-based:
Welcome to GenePattern

Analysing genomic data in GenePattern

Recent Jobs

What do you want to do?

Protocols for running common analyses in GenePattern:

Differential Expression Analysis

Prediction
And some are local applications:



geWorkbench 1.0 - An Open Extensible Platform for Biomedical Informatics 
[Screenshot: geWorkbench main window, showing the Microarray Viewer, project folders, marker and array/phenotype lists, and normalization options (marker-based centering, mean-variance normalizer, array-based centering)]


Even tools for visualizing 3D protein 
structures are available 



[Screenshot: Enactor invocation window showing workflow results and a process report for a pdbFlatFile output]





o Background and Inspiration 

o Biology Lesson 

o DNA as Data

o The Toolbox 

o An Example 

o Interlude on DNA Computing 

o How to get started 



An example of what you can do: a man-in-the-middle
attack on DNA control?



photo by: Ben Stanfield http://flickr.com/photos/acaben/2816139/ 



The lac Operon and its Control Elements
RNA Polymerase
http://en.wikipedia.org/wiki/File:Lac_operon.png



DashPat - look for common, recurring 
words and their placement... 



Hashed word: AAD

Locations: 0,1,17

Sequence: lAAH hAH KSDILRWAPCG AAH I

Primary Table

Secondary Table
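The idea of hashing every short "word" to the positions where it occurs can be sketched with a dictionary. This is not the tool's actual implementation; the function name and the 20-letter example sequence are made up for illustration.

```python
from collections import defaultdict


def build_word_index(sequence, word_length=3):
    """Map every word of the given length to the list of positions where it starts."""
    index = defaultdict(list)
    for start in range(len(sequence) - word_length + 1):
        index[sequence[start:start + word_length]].append(start)
    return dict(index)


# Illustrative sequence in which the word AAH recurs three times
sequence = "AAHAAHKSDILRWAPCGAAH"
index = build_word_index(sequence)

print(index["AAH"])  # [0, 3, 17]

# Words that occur more than once, with their placements
recurring = {word: pos for word, pos in index.items() if len(pos) > 1}
print(recurring)
```

A dictionary lookup makes finding every placement of a word a constant-time operation once the index is built.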
...possibly identifying "words" that are
statistically under- and over-represented
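One simple way to flag such words is to compare how often each one is observed with how often it would be expected by chance. The sketch below assumes a deliberately naive background model (each letter drawn independently at the sequence's own letter frequencies); real tools use more careful statistics, and the function name is made up.

```python
from collections import Counter


def word_enrichment(sequence, word_length=2):
    """Observed/expected ratio for every word that occurs in the sequence.

    Expected counts assume independent letters at the sequence's own
    frequencies. Ratios above 1 suggest over-representation, below 1
    under-representation. Words that never occur are not reported.
    """
    n_words = len(sequence) - word_length + 1
    letter_freq = {ch: n / len(sequence) for ch, n in Counter(sequence).items()}
    observed = Counter(
        sequence[i:i + word_length] for i in range(n_words)
    )
    ratios = {}
    for word, count in observed.items():
        expected = n_words
        for ch in word:
            expected *= letter_freq[ch]
        ratios[word] = count / expected
    return ratios


ratios = word_enrichment("AAAACCCC")
for word, ratio in sorted(ratios.items()):
    print(word, round(ratio, 2))
```

In this toy sequence AA and CC come out over-represented and AC under-represented, which is what the layout of the string suggests.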




Inhibition of apoptosis,
proliferation, growth
factor independence



Can be associated with genetic 
"switches" upstream of known 
genes in a genome 

Correlate the identified genes in 
a known "pathway" associated 
with metabolism 

Next step would be to test the 
hypothesis in a laboratory 



o Background and Inspiration 

o Biology Lesson 

o DNA as Data

o The Toolbox 

o An Example 

o Interlude on DNA Computing 

o How to get started 



Working the other way, too- Interlude on 
DNA computing... 







20-mer oligonucleotides representing cities

GCTATTCGAGCTTAAAGCTA

GGCTAGGTACCAGCATGCTT

20-mer oligonucleotide representing
paths between cities

DNA representation of the path from City 2 -> City 3 -> City 4

GCTATTCGAGCTTAAAGCTAGGCTAGGTAC
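The encoding on this slide can be sketched as string operations. The sketch assumes the usual scheme for this kind of DNA path computation: a path oligo joins the second half of the starting city's 20-mer to the first half of the destination's 20-mer, and the complement of a city oligo holds two adjoining path oligos together. The two city sequences are read off the slide as City 2 and City 3 (that labeling, and a cleaned-up reading of the second sequence, are assumptions); the function names are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(oligo):
    """Strand that would pair with the given oligo."""
    return oligo.translate(COMPLEMENT)[::-1]


def edge_oligo(city_from, city_to):
    """Path oligo: second half of the origin city plus first half of the destination."""
    half = len(city_from) // 2
    return city_from[half:] + city_to[:half]


cities = {
    2: "GCTATTCGAGCTTAAAGCTA",  # assumed to be City 2
    3: "GGCTAGGTACCAGCATGCTT",  # assumed to be City 3
}

edge_2_3 = edge_oligo(cities[2], cities[3])
print(edge_2_3)  # CTTAAAGCTAGGCTAGGTAC

# The 30-base string on the slide is City 2 followed by the first half of City 3,
# so the path oligo is its last 20 bases.
slide_string = cities[2] + cities[3][:10]
print(slide_string.endswith(edge_2_3))  # True

# Complement of City 3, which would bridge the 2->3 and 3->4 path oligos
print(reverse_complement(cities[3]))
```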



Translating the problem space to biological 
molecules and processes... 



Molecular finite automaton 



► Input molecule (DNA sequence)
Terminator
(final state) 



a b 
Endonuclease 
► Ligase 

Various transition molecules (with sticky ends)



Output detector molecules 



Turing machine 

► Tape aabbaaabaaaha

(With symbol or blank in each square) 



► Tapehead can do three transitions/actions
depending on its transition rules and the
symbol/blank

► Move left or right along tape

► Read/write, erase or leave symbol 

► Change status 

► Transition rules or action table

Example



Status Symbol Transition rules 


Status 
Move right, change to status 1
Move right, ...
...shows that computing can be done with
DNA 
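The machine being built out of molecules here is a small finite automaton: it reads a tape of symbols, moving right and changing status according to a rule table. A software sketch of the same idea follows; the particular rule table (accept tapes with an even number of b's) is an assumption chosen for illustration, not necessarily the one on the slide.

```python
# Two statuses, two symbols. S0 is the initial and the accepting status.
# (status, symbol) -> next status; every rule also moves one square right.
TRANSITIONS = {
    ("S0", "a"): "S0",
    ("S0", "b"): "S1",
    ("S1", "a"): "S1",
    ("S1", "b"): "S0",
}


def run_automaton(tape, start="S0", accepting=("S0",)):
    """Read the tape left to right and report the final status and acceptance."""
    status = start
    for symbol in tape:
        status = TRANSITIONS[(status, symbol)]
    return status, status in accepting


print(run_automaton("abba"))      # ('S0', True): two b's
print(run_automaton("aabbaaab"))  # ('S1', False): three b's
```

In the molecular version, the tape is the input DNA molecule, the rule table is the set of transition molecules, and the enzymes do the stepping.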



Initial status 



Initial status 
Output detector
molecule 



Endonuclease
Ligase 



Output reporter 
Accepted final status 

(Design of a molecular finite automaton, Benenson et al.)



Not limited to science alone - there are 
interesting artistic opportunities, too! 



[Diagram: NCBI Entrez search of the Nucleotide database for "zea mays alpha amylase"]

Retrieve Sequence
(NCBI Entrez)

Custom python program

python-MIDI



Rhythm MIDI 
File 



Melody MIDI 
File 



Harmony MIDI 
File 
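The middle step of this pipeline, turning a sequence into notes, might look something like the sketch below. The base-to-pitch mapping is entirely hypothetical (the slides do not say how the custom program assigns notes), and the sketch stops at a list of note events rather than calling any MIDI library.

```python
# Hypothetical mapping from bases to MIDI note numbers (60 is middle C).
BASE_TO_NOTE = {"A": 57, "C": 60, "G": 67, "T": 65}


def sequence_to_melody(sequence, beats_per_note=1):
    """Return (midi_note, start_beat, duration_beats) events, skipping unknown letters."""
    melody = []
    beat = 0
    for base in sequence.upper():
        note = BASE_TO_NOTE.get(base)
        if note is None:
            continue  # e.g. N for an unknown base
        melody.append((note, beat, beats_per_note))
        beat += beats_per_note
    return melody


melody = sequence_to_melody("ACGT")
print(melody)  # [(57, 0, 1), (60, 1, 1), (67, 2, 1), (65, 3, 1)]
```

A rhythm or harmony track could be derived the same way with a different mapping, then written out with whatever MIDI library is at hand.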
o Background and Inspiration

o Biology Lesson

o DNA as Data

o The Toolbox 

o An Example 

o Interlude on DNA Computing 

o How to get started 



Sound interesting? How to get started... 



Find a buddy - a biology expert to help you understand the 
underlying science and highlight interesting/important 
problems 

Take some classes - everything from basic biology to
advanced bioinformatics is available online, often from major
players in the field and often free

Communicate with your peers - there are all kinds of
opportunities to work with others in bioinformatics, from the
conventional (publish and present) to virtual activities like IRC,
forums, social networking, etc.

Stay current with the research - many journals are available
for reading and review; some of the best are open and freely
available



A sampling of online classes: 
o MIT OpenCourseWare Examples

(See: http://ocw.mit.edu/OcwWeb/web/courses/courses/index.htm) 

6.092 Bioinformatics and Proteomics 

6.895 / 6.095J Computational Biology: 
Genomes, Networks, Evolution 

6.096 Algorithms for Computational 
Biology 

o Stanford Bioinformatics Online 

(See: http://motif.stanford.edu/courses.html/) 

Biochem 218 Computational Molecular 
Biology 

Biochem 238 Computational Proteomics 

Biochem 228 Computational Genomic 
Biology 



Some (virtual) places to meet and 
communicate 



myExperiment

myExperiment makes it really easy to find, use and share
scientific workflows and other files, and to build communities

Use myExperiment to...
Find Workflows
Find Files
Share Your Workflows and Files
Create and Find Packs of Items
Create and Join Groups
Find People and Make Friends
Send Messages
Get Feedback
Tag and Rate things
Write Reviews and Comments
Build your Profile and Reputation



[Screenshots: SciLink and Epernicus, social networking sites for research scientists]

Epernicus: The Shortest Path to People and Expertise In Your Scientific Network

Epernicus is open to current and former
research scientists.
Stay Current with journal articles...




PubMed (and freely-available PubMed 
Central articles) 

Freely-available journals at Public 
Library of Science and elsewhere 

Your local University library 




photo by sifter: http://www.flickr.com/photos/sifter/370775225/ 



BMC Bioinformatics

PLoS Computational Biology

An official journal of the International Society for Computational Biology



Now... 



Get out there and hack the genome! 



